APPROXIMATION OF PROJECTIONS OF RANDOM VECTORS 



ELIZABETH MECKES 

Abstract. Let $X$ be a $d$-dimensional random vector and $X_\theta$ its projection onto the span of a set of orthonormal vectors $\{\theta_1, \ldots, \theta_k\}$. Conditions on the distribution of $X$ are given such that if $\theta$ is chosen according to Haar measure on the Stiefel manifold, the bounded-Lipschitz distance from $X_\theta$ to a Gaussian distribution is concentrated at its expectation; furthermore, an explicit bound is given for the expected distance, in terms of $d$, $k$, and the distribution of $X$, allowing consideration not just of fixed $k$ but of $k$ growing with $d$. The results are applied in the setting of projection pursuit, showing that most $k$-dimensional projections of $n$ data points in $\mathbb{R}^d$ are close to Gaussian, when $n$ and $d$ are large and $k = c\sqrt{\log(d)}$ for a small constant $c$.



1. Introduction 

There is a large class of results dealing with random variables (or measures) defined in terms of a parameter (say, a point on the sphere), which say that for a large measure of these parameters, the behavior of the random variable is well-approximated by some model distribution. Early work in this direction was done by Sudakov [21], who showed that under some relatively mild conditions, most one-dimensional marginals of a high-dimensional measure are close to each other. This line of research was further developed by von Weizsäcker [23], who showed that the canonical distribution around which one-dimensional marginals tend to cluster is close to a mixture of Gaussian distributions. In both [21] and [23], the results are about the limiting behavior of one-dimensional projections as the ambient dimension tends to infinity, although von Weizsäcker points out that one could extend the methods to deal with higher fixed-dimensional projections. More recent work in this area was done by Bobkov [3], who obtained concentration results for the distance from a one-dimensional projection of an isotropic log-concave random vector to a Gaussian distribution.

The purpose of this paper is to prove multivariate versions of such theorems; that is, to consider rank $k$ projections of random vectors, instead of just rank one. Moreover, the approach yields results of a sufficiently quantitative nature to allow not only $k$ fixed, but $k$ growing with the ambient dimension. The general case of approximating random $k$-dimensional projections of probability measures on $\mathbb{R}^d$ is considered, and is illustrated with an application to graphical projection pursuit. In particular, it is shown that typical $k$-dimensional projections of $n$ data points in $\mathbb{R}^d$ are close to Gaussian for $n$ and $d$ large; the precise quantitative nature of the results yields limit theorems even for $k = c\sqrt{\log(d)}$ for a small constant $c$. This result generalizes the following univariate limit result of Diaconis and Freedman.

Theorem 1 (Diaconis-Freedman [7]). Let $x_1, \ldots, x_n$ be deterministic vectors in $\mathbb{R}^d$. Suppose that $n$, $d$ and the $x_i$ depend on a hidden index $\nu$, so that as $\nu$ tends to infinity, so do $n$ and $d$. Suppose that there is a $\sigma > 0$ such that, for all $\epsilon > 0$,

$$\frac{1}{n}\Big|\big\{j \le n : \big||x_j|^2 - \sigma^2 d\big| > \epsilon d\big\}\Big| \to 0, \tag{1}$$

and suppose that

$$\frac{1}{n^2}\Big|\big\{j, k \le n : |\langle x_j, x_k\rangle| > \epsilon d\big\}\Big| \to 0. \tag{2}$$

Let $\theta \in \mathbb{S}^{d-1}$ be distributed uniformly on the sphere, and consider the random measure $\mu_\nu^\theta$ which puts mass $\frac{1}{n}$ at each of the points $\langle\theta, x_1\rangle, \ldots, \langle\theta, x_n\rangle$. Then as $\nu$ tends to infinity, the measures $\mu_\nu^\theta$ tend to $\mathcal{N}(0, \sigma^2)$ weakly in probability.

The method of proof here is described in a fairly specific context: random measures indexed by points in the Stiefel manifold (one could equivalently take points in the Grassmann manifold), approximated by Gaussian distributions. However, the approach is quite general and could in principle be adapted to a family of random measures indexed by points in a metric probability space possessing the concentration of measure phenomenon. Further, one could easily adapt the program to deal with non-Gaussian limits. In particular, Stein's method has been used to prove approximation results for many other limiting distributions, e.g. Poisson [5, 1, 2]; gamma [14]; chi-square [18]; uniform on the discrete circle [6]; the semicircle law [10]; the binomial and multinomial distributions [11, 13]; and the hypergeometric distribution [11]. These approaches could be combined with what is done here in order to approximate by non-Gaussian distributions.

Before outlining the approach, some notation is needed. The Euclidean length of a vector $x \in \mathbb{R}^d$ is denoted $|x|$. For an $n \times n$ matrix $M = [m_{ij}]_{i,j=1}^n$, the Hilbert-Schmidt norm is defined by

$$\|M\|_{H.S.} = \sqrt{\operatorname{Tr}(MM^T)} = \sqrt{\sum_{i,j=1}^n m_{ij}^2}.$$
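As a small numerical illustration of this definition (a sketch in plain Python, not part of the original argument; the helper names are hypothetical), the norm can be computed entrywise or through $\operatorname{Tr}(MM^T)$. The rank-one check mirrors a fact used later in the proof of Theorem 4, namely $\|vv^T\|_{H.S.} = |v|^2$.

```python
import math

def hs_norm(M):
    """Hilbert-Schmidt norm: square root of the sum of the squared entries."""
    return math.sqrt(sum(m_ij * m_ij for row in M for m_ij in row))

def hs_norm_via_trace(M):
    """The same norm computed as sqrt(Tr(M M^T)), forming M M^T explicitly."""
    rows, cols = len(M), len(M[0])
    MMt = [[sum(M[i][l] * M[j][l] for l in range(cols)) for j in range(rows)]
           for i in range(rows)]
    return math.sqrt(sum(MMt[i][i] for i in range(rows)))

def outer(v):
    """Rank-one matrix v v^T."""
    return [[a * b for b in v] for a in v]

M = [[1.0, 2.0], [3.0, 4.0]]
print(hs_norm(M), hs_norm_via_trace(M))  # both equal sqrt(30)
print(hs_norm(outer([3.0, 4.0])))        # equals |v|^2 = 25
```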



The Wasserstein distance between two random vectors $X$ and $Y$ is defined by

$$d_W(X, Y) = \sup_{\{f : |f(x) - f(y)| \le |x - y|\}}\big|\mathbb{E}f(Y) - \mathbb{E}f(X)\big|.$$
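For intuition about $d_W$ in the simplest setting, the sketch below uses the standard fact that for two empirical measures on $\mathbb{R}$ with the same number of atoms, the Wasserstein distance equals the average absolute difference of the sorted samples (the monotone coupling is optimal). The function name is hypothetical, and this one-dimensional shortcut does not carry over to $\mathbb{R}^k$ for $k > 1$.

```python
def wasserstein_1d_empirical(xs, ys):
    """d_W between the uniform measures on xs and on ys (equal sizes), on R."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))  # monotone (quantile) coupling
    return sum(abs(a - b) for a, b in pairs) / len(xs)

xs = [0.3, -1.2, 0.8, 2.0]
ys = [x + 0.5 for x in xs]
print(wasserstein_1d_empirical(xs, ys))  # a pure shift by 0.5 gives (about) 0.5

# f(x) = x is 1-Lipschitz, so the difference of means never exceeds d_W.
zs = [1.0, 0.0, -0.5, 0.25]
gap = abs(sum(xs) / len(xs) - sum(zs) / len(zs))
print(gap <= wasserstein_1d_empirical(xs, zs))
```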

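The bounded-Lipschitz distance is defined by

$$d_{BL}(X, Y) := \sup_{\|f\|_1 \le 1}\big|\mathbb{E}f(X) - \mathbb{E}f(Y)\big|,$$

where

$$\|f\|_1 := \max\Big\{\|f\|_\infty,\ \sup_{x \ne y}\frac{|f(x) - f(y)|}{|x - y|}\Big\}.$$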

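The class of $m$-times continuously differentiable functions on $\mathcal{X} \subseteq \mathbb{R}^d$ is denoted $C^m(\mathcal{X})$, and has a norm defined by

$$\|f\|_m := \sup_{0 \le k \le m}\ \sup_{x \in \mathcal{X}}\|D^k f(x)\|_{op}.$$

Here, $D^k f(x)$ denotes the symmetric $k$-linear form given in components by

$$D^k f(x)(y_1, \ldots, y_k) = \sum_{i_1, \ldots, i_k = 1}^d \frac{\partial^k f}{\partial x_{i_1}\cdots\partial x_{i_k}}(x)\, y_{1 i_1}\cdots y_{k i_k},$$

where $y_j = (y_{j1}, \ldots, y_{jd})$. For an intrinsic definition of $D^k f(x)$, see Federer [9]. The ball of radius $R$ in $C^m(\mathcal{X})$ with respect to $\|\cdot\|_m$ is denoted $C^m_R(\mathcal{X})$.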
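The Stiefel manifold $\mathfrak{W}_{d,k}$ is defined by

$$\mathfrak{W}_{d,k} := \big\{\theta = (\theta_1, \ldots, \theta_k) : \theta_i \in \mathbb{R}^d,\ \langle\theta_i, \theta_j\rangle = \delta_{ij} \text{ for } 1 \le i, j \le k\big\},$$

with metric $\rho(\theta, \theta') = \big(\sum_{i=1}^k|\theta_i - \theta'_i|^2\big)^{1/2}$. There is a unique rotation-invariant probability measure (Haar measure) on $\mathfrak{W}_{d,k}$; one way to construct it is by choosing $\theta_1$ uniformly from $\mathbb{S}^{d-1}$, then $\theta_2$ uniformly from the orthogonal complement of $\theta_1$ in $\mathbb{S}^{d-1}$, and so on.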
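The sequential construction of Haar measure can be sketched as follows, assuming only the Python standard library. A standard Gaussian vector projected onto the orthogonal complement of the vectors already chosen and then normalized is uniform on the unit sphere of that complement, so Gram-Schmidt applied to independent Gaussian vectors implements the description above. The helper names are hypothetical.

```python
import math
import random

def sample_stiefel(d, k, rng):
    """Draw theta = (theta_1, ..., theta_k) from Haar measure on W_{d,k}.

    theta_1 is uniform on S^{d-1}; each later theta_j is a standard Gaussian
    vector projected onto the orthogonal complement of the earlier vectors
    and normalized, hence uniform on the unit sphere of that complement.
    """
    if not 1 <= k <= d:
        raise ValueError("need 1 <= k <= d")
    theta = []
    while len(theta) < k:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for t in theta:  # Gram-Schmidt: remove the component along t
            c = sum(gi * ti for gi, ti in zip(g, t))
            g = [gi - c * ti for gi, ti in zip(g, t)]
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm > 1e-12:  # a degenerate draw has probability zero; redraw
            theta.append([gi / norm for gi in g])
    return theta

def gram(theta):
    """Matrix of inner products <theta_i, theta_j>."""
    return [[sum(a * b for a, b in zip(u, v)) for v in theta] for u in theta]

theta = sample_stiefel(6, 3, random.Random(1))
G = gram(theta)
print(max(abs(G[i][j] - (1.0 if i == j else 0.0))
          for i in range(3) for j in range(3)))  # close to 0
```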

Now, suppose that a family of random vectors $X_\theta$ in $\mathbb{R}^k$ is indexed by $\theta \in \mathfrak{W}_{d,k}$. The following is an outline of an approach to show that most $X_\theta$ are approximately Gaussian.

1. Prove an approximation result for the average distribution. If $X_\theta$ is defined fairly explicitly in terms of $\theta$, one can first try to use the following abstract normal approximation theorem to show that the average distribution of the $X_\theta$ (averaged over $\theta$ distributed according to Haar measure on $\mathfrak{W}_{d,k}$) is close to Gaussian.

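Theorem 2. Let $X$ be a random vector in $\mathbb{R}^k$ and for each $\epsilon > 0$ let $X_\epsilon$ be a random vector such that $\mathcal{L}(X) = \mathcal{L}(X_\epsilon)$, with the property that $\lim_{\epsilon \to 0}X_\epsilon = X$ almost surely. Let $Z$ be a standard normal random vector in $\mathbb{R}^k$. Suppose there is a function $\lambda(\epsilon)$ and a random matrix $F$ such that the following conditions hold.

(i) $\displaystyle\frac{1}{\lambda(\epsilon)}\mathbb{E}\big[(X_\epsilon - X)\,\big|\,X\big] \xrightarrow[\epsilon \to 0]{L^1} -X.$

(ii) $\displaystyle\frac{1}{2\lambda(\epsilon)}\mathbb{E}\big[(X_\epsilon - X)(X_\epsilon - X)^T\,\big|\,X\big] \xrightarrow[\epsilon \to 0]{L^1} \sigma^2 I_k + \mathbb{E}\big[F\,\big|\,X\big].$

(iii) For each $\rho > 0$,

$$\lim_{\epsilon \to 0}\frac{1}{\lambda(\epsilon)}\mathbb{E}\Big[|X_\epsilon - X|^2\,\mathbb{1}\big(|X_\epsilon - X|^2 > \rho\big)\Big] = 0.$$

Then

$$d_W(X, \sigma Z) \le \frac{1}{\sigma}\mathbb{E}\|F\|_{H.S.}. \tag{3}$$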

It should be pointed out that while this theorem is sufficiently general for the applications carried out here, there is a more general version (see … or [15]) allowing for approximations by Gaussian distributions with non-trivial covariance matrices. Furthermore, condition (i) need only hold approximately (see …).

In order to apply this theorem, an auxiliary random variable $X_{\theta_\epsilon}$ must be constructed. A natural construction which makes use of the symmetry of $\mathfrak{W}_{d,k}$ is to let $\theta_\epsilon$ be a "small random rotation" of $\theta$ (this is made explicit in the applications to follow). Then $(\theta, \theta_\epsilon)$ is an exchangeable pair of random points of $\mathfrak{W}_{d,k}$ by the rotation invariance of the distribution of $\theta$, and so the random variables $(X_\theta, X_{\theta_\epsilon})$ are also exchangeable and thus have the same distribution. Furthermore, as $\epsilon \to 0$, $\theta_\epsilon \to \theta$ almost surely, and so if $X_\theta$ is a continuous function of $\theta$, it will be true that $X_{\theta_\epsilon} \to X_\theta$ almost surely.
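A "small random rotation" can be sketched concretely. The version below assumes the form used later in the proof of Theorem 4: $\theta_\epsilon = (UA_\epsilon U^T\theta_1, \ldots, UA_\epsilon U^T\theta_k)$, with $U$ a random orthogonal matrix and $A_\epsilon$ a rotation in the $(e_1, e_2)$-plane. It checks numerically that $\theta_\epsilon$ stays on $\mathfrak{W}_{d,k}$ and that $\rho(\theta, \theta_\epsilon) \le \sqrt{k(2 - 2\sqrt{1 - \epsilon^2})}$, the operator norm bound for $A_\epsilon - I$. All helper names are hypothetical.

```python
import math
import random

def haar_orthonormal(d, k, rng):
    """k orthonormal vectors in R^d via Gram-Schmidt on Gaussian vectors."""
    vecs = []
    while len(vecs) < k:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for t in vecs:
            c = sum(a * b for a, b in zip(g, t))
            g = [a - c * b for a, b in zip(g, t)]
        n = math.sqrt(sum(a * a for a in g))
        if n > 1e-12:
            vecs.append([a / n for a in g])
    return vecs

def small_rotation(theta, eps, rng):
    """theta_eps = (U A_eps U^T theta_1, ..., U A_eps U^T theta_k).

    The columns of U are u_1, ..., u_d; A_eps acts on the first two
    coordinates as [[c, eps], [-eps, c]] with c = sqrt(1 - eps^2).
    """
    d = len(theta[0])
    u = haar_orthonormal(d, d, rng)  # columns of a random orthogonal U
    c = math.sqrt(1.0 - eps * eps)
    out = []
    for v in theta:
        w = [sum(a * b for a, b in zip(ui, v)) for ui in u]  # w = U^T v
        w0, w1 = w[0], w[1]
        w[0], w[1] = c * w0 + eps * w1, -eps * w0 + c * w1   # apply A_eps
        out.append([sum(w[i] * u[i][j] for i in range(d)) for j in range(d)])
    return out

def rho(theta, phi):
    """The metric rho on W_{d,k}."""
    return math.sqrt(sum((a - b) ** 2
                         for t, p in zip(theta, phi) for a, b in zip(t, p)))

rng = random.Random(2)
theta = haar_orthonormal(5, 2, rng)
theta_eps = small_rotation(theta, 1e-3, rng)
print(rho(theta, theta_eps))  # of order eps
```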
2. Use the concentration of measure on $\mathfrak{W}_{d,k}$ to show that for some distance $d(\cdot, \cdot)$, $d(X_\theta, \sigma Z)$ is close to its mean. It is shown in [17] that for $c_1 = \sqrt{\pi/2}$ and $c_2 = \frac{1}{8}$, for any $F : \mathfrak{W}_{d,k} \to \mathbb{R}$ with median $M_F$ and modulus of continuity $\omega_F(\eta)$,

$$\mathbb{P}\big[|F(\theta_1, \ldots, \theta_k) - M_F| > \omega_F(\eta)\big] \le c_1 e^{-c_2 d\eta^2}. \tag{4}$$

Here, $\mathbb{P}$ is the rotation-invariant probability measure on $\theta_1, \ldots, \theta_k$ described above. The median $M_F$ is a median with respect to this measure.

Again, if the random variable $X_\theta$ is a sufficiently regular function of $\theta$, this theorem can be applied to the function $F(\theta) = d_{BL}(X_\theta, \sigma Z)$, where $d_{BL}(X_\theta, \sigma Z)$ is the conditional bounded-Lipschitz distance from $X_\theta$ to $\sigma Z$, given $\theta$. Standard arguments allow the median $M_F$ to be replaced by the mean $\mathbb{E}F(\theta)$, with only minor loss.
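The concentration phenomenon behind (4) is easy to observe numerically. In the hypothetical example below, $X$ is uniform on the points $\sqrt{d}\,e_1, \ldots, \sqrt{d}\,e_d$ (so $B = 1$ in the notation of Section 2), $k = 1$, and $F(\theta) = \mathbb{E}_X|\langle X, \theta\rangle| = d^{-1/2}\sum_i|\theta_i|$, which is 1-Lipschitz on the sphere by the Cauchy-Schwarz inequality. The sampled values of $F$ cluster tightly around $\sqrt{2/\pi}$. The choice of $F$, $d$ and the sample size are assumptions made only for illustration, and the simulation does not test the constants $c_1$, $c_2$.

```python
import math
import random
import statistics

def uniform_sphere(d, rng):
    """Uniform point on S^{d-1}: a normalized standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    r = math.sqrt(sum(x * x for x in g))
    return [x / r for x in g]

def F(theta):
    """E_X |<X, theta>| for X uniform on {sqrt(d) e_i}: (1/sqrt(d)) sum |theta_i|."""
    return sum(abs(t) for t in theta) / math.sqrt(len(theta))

rng = random.Random(0)
d = 400
values = [F(uniform_sphere(d, rng)) for _ in range(200)]
print(statistics.mean(values), statistics.stdev(values))
print(math.sqrt(2 / math.pi))  # the value the sample concentrates around
```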

3. Use entropy methods to bound $\mathbb{E}d_{BL}(X_\theta, \sigma Z)$. Consider the stochastic process $Y_f := |\mathbb{E}_X f(X_\theta) - \mathbb{E}f(X_\theta)|$ indexed by the class of functions $\{f : \|f\|_1 \le 1\}$ (or by some sub-class), where $\mathbb{E}_X$ denotes expectation with respect to $X$ only; that is, conditional expectation with respect to the distribution of $(X, \theta)$, conditioned on $\theta$. Thus the bounded-Lipschitz distance from $X_\theta$ (given $\theta$) to its average distribution can be viewed as the supremum of a stochastic process. The same approach used to prove a concentration result for $d_{BL}(X_\theta, \sigma Z)$ can be used to show that $Y_f$ satisfies a sub-Gaussian increment condition of the type

$$\mathbb{P}\big[|Y_f - Y_g| > \epsilon\big] \le c_1 e^{-c_2 d\epsilon^2/\|f - g\|_1^2},$$

for some constants $c_1$ and $c_2$. For such a process, Dudley's entropy bound can be used to estimate its supremum. Specifically, Dudley showed the following.

Theorem 3 (Dudley [8]). Let $\{X_t\}_{t \in T}$ be a stochastic process indexed by a metric space $T$ with distance $d$. Suppose that there is a constant $c$ such that $X_t$ satisfies the increment condition

$$\forall u > 0, \qquad \mathbb{P}\big[|X_t - X_s| > u\big] \le c\exp\Big(-\frac{u^2}{2d(s,t)^2}\Big).$$

Then there is a constant $C$ such that

$$\mathbb{E}\sup_{t \in T}X_t \le C\int_0^\infty\sqrt{\log N(T, d, \epsilon)}\,d\epsilon,$$

where $N(T, d, \epsilon)$ is the $\epsilon$-covering number of $T$ with respect to the distance $d$.
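The entropy integral in Theorem 3 is often estimated numerically. A minimal sketch, assuming the caller supplies $\epsilon \mapsto \log N(T, d, \epsilon)$ and the diameter of $T$ (beyond which a single ball covers $T$, so the integrand vanishes); the function name and the midpoint rule are illustrative choices. As a check, for $\log N(\epsilon) = \log(1/\epsilon)$ on $(0, 1)$ the integral equals $\Gamma(3/2) = \sqrt{\pi}/2$ (substitute $\epsilon = e^{-u}$).

```python
import math

def dudley_integral(log_covering, diameter, n=200_000):
    """Midpoint-rule approximation of int_0^diameter sqrt(log N(eps)) d eps.

    For eps >= diameter one ball covers T, so log N = 0 there and the
    range (diameter, infinity) contributes nothing to the integral.
    """
    h = diameter / n
    total = 0.0
    for i in range(n):
        eps = (i + 0.5) * h
        total += math.sqrt(max(log_covering(eps), 0.0))
    return total * h

approx = dudley_integral(lambda eps: math.log(1.0 / eps), 1.0)
print(approx, math.sqrt(math.pi) / 2)
```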

One can apply this theorem not to the index set $\{f : \|f\|_1 \le 1\}$ (which has infinite $\epsilon$-covering number with respect to $\|\cdot\|_1$ for $\epsilon < 2$), but to a more restricted indexing set $\mathcal{F}$ of test functions. One may then be able to obtain a bound on $\mathbb{E}d_{BL}(X_\theta, \sigma Z)$ by approximation of functions $f$ with $\|f\|_1 \le 1$ by functions from $\mathcal{F}$, together with the approximation for the average distribution proved in Step 1.

2. Random Projections 

In this section, the method outlined in the introduction is applied in the case that $X$ is a random vector in $\mathbb{R}^d$, $\theta = (\theta_1, \ldots, \theta_k) \in \mathfrak{W}_{d,k}$, and $X_\theta$ is the projection of $X$ onto the span of $\theta$; that is,

$$X_\theta := \big(\langle X, \theta_1\rangle, \ldots, \langle X, \theta_k\rangle\big).$$

If $\theta$ is chosen randomly from $\mathfrak{W}_{d,k}$ (according to the rotation-invariant probability measure described in the introduction), then the distributions of the $X_\theta$ are a family of random measures on $\mathbb{R}^k$ indexed by $\theta$.

To apply the method of the introduction, consider the random variable $X_\theta$ defined above, in the case that $\theta$ is chosen at random and independent of $X$. The following results describe the behavior of $X_\theta$, both on average and conditioned on $\theta$.

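Theorem 4. Let $X$ be a random vector in $\mathbb{R}^d$, with $\mathbb{E}X = 0$, $\mathbb{E}|X|^2 = \sigma^2 d$, and $\mathbb{E}\big||X|^2\sigma^{-2} - d\big| := A < \infty$. If $\theta$ is a random point of $\mathfrak{W}_{d,k}$ and $X_\theta$ is defined as above, then

$$d_W(X_\theta, \sigma Z) \le \frac{\sigma\sqrt{k}(A + 1) + \sigma k}{d - 1}.$$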



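Theorem 5. Suppose that $B$ is defined by $B := \sup_{\xi \in \mathbb{S}^{d-1}}\mathbb{E}\langle X, \xi\rangle^2$. For $\theta \in \mathfrak{W}_{d,k}$, let

$$d_{BL}(X_\theta, \sigma Z) = \sup_{\|f\|_1 \le 1}\Big|\mathbb{E}\big[f(\langle X, \theta_1\rangle, \ldots, \langle X, \theta_k\rangle)\,\big|\,\theta\big] - \mathbb{E}f(\sigma Z_1, \ldots, \sigma Z_k)\Big|;$$

that is, $d_{BL}(X_\theta, \sigma Z)$ is the conditional bounded-Lipschitz distance from $X_\theta$ to $\sigma Z$, conditioned on $\theta$. Then for $\epsilon > 2\pi\sqrt{B/d}$ and $\theta$ a random point of $\mathfrak{W}_{d,k}$,

$$\mathbb{P}\big[|d_{BL}(X_\theta, \sigma Z) - \mathbb{E}d_{BL}(X_\theta, \sigma Z)| > \epsilon\big] \le \sqrt{\frac{\pi}{2}}\,e^{-\frac{d\epsilon^2}{32B}}.$$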

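Theorem 6. There is a constant $C > 1$ such that

$$\mathbb{E}d_{BL}(X_\theta, \sigma Z) \le \frac{C^k(B + 1)}{d^{\frac{2}{9k+4}}} + \frac{\sigma\sqrt{k}(A + 1) + \sigma k}{d - 1}.$$

Observe that together, Theorems 5 and 6 show that for $\epsilon > 2\Big[\frac{C^k(B + 1)}{d^{2/(9k+4)}} + \frac{\sigma\sqrt{k}(A + 1) + \sigma k}{d - 1}\Big]$,

$$\mathbb{P}\big[d_{BL}(X_\theta, \sigma Z) > \epsilon\big] \le \sqrt{\frac{\pi}{2}}\,e^{-\frac{d\epsilon^2}{128B}}.$$

Note that the bound on the right tends to zero as $d \to \infty$ for any $\epsilon$ in this range.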
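Proof of Theorem 4. Observe first that $\mathbb{E}X_\theta = 0$ by symmetry and

$$\mathbb{E}\big[(X_\theta)_i(X_\theta)_j\big] = \mathbb{E}\big[\langle\theta_i, X\rangle\langle\theta_j, X\rangle\big] = \sum_{r,s=1}^d\mathbb{E}[\theta_{ir}\theta_{js}]\,\mathbb{E}[X_r X_s] = \frac{\delta_{ij}}{d}\mathbb{E}\big[|X|^2\big] = \delta_{ij}\sigma^2,$$

where the second-last equality follows from $\mathbb{E}[\theta_{ir}\theta_{js}] = \frac{1}{d}\delta_{ij}\delta_{rs}$.

To apply Theorem 2 to $X_\theta$, one first has to construct $X_{\theta_\epsilon}$. Let

$$A_\epsilon = \begin{bmatrix}\sqrt{1 - \epsilon^2} & \epsilon \\ -\epsilon & \sqrt{1 - \epsilon^2}\end{bmatrix} \oplus I_{d-2} = I_d + \begin{bmatrix}-\frac{\epsilon^2}{2} + \delta & \epsilon \\ -\epsilon & -\frac{\epsilon^2}{2} + \delta\end{bmatrix} \oplus 0_{d-2},$$

where $\delta = O(\epsilon^4)$. Let $U \in \mathcal{O}_d$ be a random orthogonal matrix, independent of $X$ and $\theta$, and define

$$X_{\theta_\epsilon} := \big(\langle UA_\epsilon U^T\theta_1, X\rangle, \ldots, \langle UA_\epsilon U^T\theta_k, X\rangle\big);$$

the pair $(X_\theta, X_{\theta_\epsilon})$ is exchangeable by the rotation invariance of the distribution of $\theta$, and so $\mathcal{L}(X_\theta) = \mathcal{L}(X_{\theta_\epsilon})$.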
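Let $K$ be the $d \times 2$ matrix given by the first two columns of $U$, let $C = \begin{bmatrix}0 & 1 \\ -1 & 0\end{bmatrix}$, and define the matrix $Q = [q_{ij}]_{i,j=1}^d = KCK^T$. Then, writing $X_\theta = (X_\theta^1, \ldots, X_\theta^k)$ and $X_{\theta_\epsilon} = (X_{\theta_\epsilon}^1, \ldots, X_{\theta_\epsilon}^k)$,

$$\begin{aligned}\mathbb{E}\big[X_{\theta_\epsilon}^j - X_\theta^j\,\big|\,X, \theta\big] &= \mathbb{E}\big[\langle(UA_\epsilon U^T - I_d)\theta_j, X\rangle\,\big|\,X, \theta\big] \\ &= \epsilon\,\mathbb{E}\big[\langle Q\theta_j, X\rangle\,\big|\,X, \theta\big] - \frac{\epsilon^2}{2}\mathbb{E}\big[\langle KK^T\theta_j, X\rangle\,\big|\,X, \theta\big] + O(\epsilon^4).\end{aligned}$$

Recall that $Q$ and $K$ are determined by $U$ alone, and that $U$ is independent of $X$ and $\theta$. It is easy to show that $\mathbb{E}[Q] = 0_d$ and $\mathbb{E}[KK^T] = \frac{2}{d}I_d$; thus

$$\mathbb{E}\big[X_{\theta_\epsilon} - X_\theta\,\big|\,X, \theta\big] = -\frac{\epsilon^2}{d}X_\theta + O(\epsilon^4).$$

Condition (i) of Theorem 2 is thus satisfied (conditioning on $(X, \theta)$) with $\lambda(\epsilon) = \frac{\epsilon^2}{d}$.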

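It is elementary but tedious to show that $\mathbb{E}[q_{rs}q_{tw}] = \frac{2}{d(d-1)}\big[\delta_{rt}\delta_{sw} - \delta_{rw}\delta_{st}\big]$ (the computation is carried out in detail in …). Making use of this yields

$$\begin{aligned}\mathbb{E}\big[(X_{\theta_\epsilon}^j - X_\theta^j)(X_{\theta_\epsilon}^\ell - X_\theta^\ell)\,\big|\,X, \theta\big] &= \epsilon^2\,\mathbb{E}\big[\langle Q\theta_j, X\rangle\langle Q\theta_\ell, X\rangle\,\big|\,X, \theta\big] + O(\epsilon^3) \\ &= \epsilon^2\sum_{r,s,t,w=1}^d\mathbb{E}[q_{rs}q_{tw}]\,\theta_{js}\theta_{\ell w}X_r X_t + O(\epsilon^3) \\ &= \frac{2\epsilon^2}{d(d-1)}\sum_{r,s=1}^d\big[\theta_{js}\theta_{\ell s}X_r^2 - \theta_{js}\theta_{\ell r}X_r X_s\big] + O(\epsilon^3) \\ &= \frac{2\epsilon^2}{d(d-1)}\Big[\delta_{j\ell}|X|^2 - X_\theta^j X_\theta^\ell\Big] + O(\epsilon^3).\end{aligned}$$

Dividing by $2\lambda(\epsilon) = \frac{2\epsilon^2}{d}$ gives $\frac{1}{d-1}\big[|X|^2 I_k - X_\theta X_\theta^T\big] + O(\epsilon) = \sigma^2 I_k + F + O(\epsilon)$, so the random matrix $F$ of Theorem 2 is defined by

$$F = \frac{1}{d-1}\Big[\big(|X|^2 - \sigma^2 d\big)I_k + \sigma^2 I_k - X_\theta X_\theta^T\Big].$$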



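It follows from the theorem, using $\|I_k\|_{H.S.} = \sqrt{k}$, $\|X_\theta X_\theta^T\|_{H.S.} = |X_\theta|^2$ and $\mathbb{E}|X_\theta|^2 = k\sigma^2$, that

$$d_W(X_\theta, \sigma Z) \le \frac{1}{\sigma}\mathbb{E}\|F\|_{H.S.} \le \frac{\sigma}{d-1}\Big[\sqrt{k}\Big(\mathbb{E}\big||X|^2\sigma^{-2} - d\big| + 1\Big) + k\Big] = \frac{\sigma\sqrt{k}(A + 1) + \sigma k}{d - 1}. \tag{5}$$

□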
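Proof of Theorem 5. Define a function $F : \mathfrak{W}_{d,k} \to \mathbb{R}$ by

$$F(\theta) = \sup_{\|f\|_1 \le 1}\big|\mathbb{E}_X f(X_\theta) - \mathbb{E}f(\sigma Z)\big|,$$

where $\mathbb{E}_X$ denotes the expectation with respect to the distribution of $X$ only; that is,

$$\mathbb{E}_X f(X_\theta) = \mathbb{E}\big[f(X_\theta)\,\big|\,\theta\big].$$

To apply the concentration of measure on $\mathfrak{W}_{d,k}$, it is necessary to determine the modulus of continuity of $F$. First, observe that for $f$ with $\|f\|_1 \le 1$ given,

$$\begin{aligned}\Big|\big|\mathbb{E}_X f(X_\theta) - \mathbb{E}f(\sigma Z)\big| - \big|\mathbb{E}_X f(X_{\theta'}) - \mathbb{E}f(\sigma Z)\big|\Big| &\le \big|\mathbb{E}_X f(X_{\theta'}) - \mathbb{E}_X f(X_\theta)\big| \\ &= \Big|\mathbb{E}\Big[f\big(\langle X, \theta'_1\rangle, \ldots, \langle X, \theta'_k\rangle\big) - f\big(\langle X, \theta_1\rangle, \ldots, \langle X, \theta_k\rangle\big)\,\Big|\,\theta, \theta'\Big]\Big| \\ &\le \mathbb{E}\Big[\big|\big(\langle X, \theta'_1 - \theta_1\rangle, \ldots, \langle X, \theta'_k - \theta_k\rangle\big)\big|\,\Big|\,\theta, \theta'\Big] \\ &\le \sqrt{\sum_{i=1}^k\mathbb{E}\big[\langle X, \theta'_i - \theta_i\rangle^2\,\big|\,\theta, \theta'\big]} \le \rho(\theta, \theta')\sqrt{B}.\end{aligned}$$

It follows that

$$\begin{aligned}\big|d_{BL}(X_\theta, \sigma Z) - d_{BL}(X_{\theta'}, \sigma Z)\big| &= \Big|\sup_{\|f\|_1 \le 1}\big|\mathbb{E}_X f(X_\theta) - \mathbb{E}f(\sigma Z)\big| - \sup_{\|f\|_1 \le 1}\big|\mathbb{E}_X f(X_{\theta'}) - \mathbb{E}f(\sigma Z)\big|\Big| \\ &\le \sup_{\|f\|_1 \le 1}\Big|\big|\mathbb{E}_X f(X_\theta) - \mathbb{E}f(\sigma Z)\big| - \big|\mathbb{E}_X f(X_{\theta'}) - \mathbb{E}f(\sigma Z)\big|\Big| \le \rho(\theta, \theta')\sqrt{B};\end{aligned}$$

thus $d_{BL}(X_\theta, \sigma Z)$ is a Lipschitz function on $\mathfrak{W}_{d,k}$, with Lipschitz constant $\sqrt{B}$. Applying the concentration of measure inequality (4) from the introduction then implies that

$$\mathbb{P}\big[|F(\theta_1, \ldots, \theta_k) - M_F| > \epsilon\big] \le \sqrt{\frac{\pi}{2}}\,e^{-\frac{d\epsilon^2}{8B}}.$$

Now, if $\theta = (\theta_1, \ldots, \theta_k)$ is a Haar-distributed random point of $\mathfrak{W}_{d,k}$, then

$$|\mathbb{E}F(\theta) - M_F| \le \mathbb{E}|F(\theta) - M_F| = \int_0^\infty\mathbb{P}\big[|F(\theta) - M_F| > t\big]\,dt \le \int_0^\infty\sqrt{\frac{\pi}{2}}\,e^{-\frac{dt^2}{8B}}\,dt = \pi\sqrt{\frac{B}{d}}.$$

So as long as $\epsilon > 2\pi\sqrt{B/d}$, replacing the median of $F$ with its mean only changes the constants:

$$\mathbb{P}\big[|F(\theta) - \mathbb{E}F(\theta)| > \epsilon\big] \le \mathbb{P}\big[|F(\theta) - M_F| > \epsilon - |M_F - \mathbb{E}F(\theta)|\big] \le \mathbb{P}\Big[|F(\theta) - M_F| > \frac{\epsilon}{2}\Big] \le \sqrt{\frac{\pi}{2}}\,e^{-\frac{d\epsilon^2}{32B}}.$$

□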



What has just been shown is that $d_{BL}(X_\theta, \sigma Z)$ is concentrated about its mean; it remains to give a bound for this mean (Theorem 6).

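Proof of Theorem 6. As indicated in the introduction, Theorem 6 is proved making use of Dudley's entropy bound for bounding the expected value of the supremum of a stochastic process. Let $X_f := |\mathbb{E}_X f(X_\theta) - \mathbb{E}f(X_\theta)|$. Then $\{X_f\}_f$ is a stochastic process (each $X_f$ is a random variable depending on $\theta$) indexed by a family of functions $f$. The same type of concentration argument used above can be used to show that this process is sub-Gaussian.

Let $f : \mathbb{R}^k \to \mathbb{R}$ be Lipschitz with Lipschitz constant $L$ and consider the function $G = G_f$ defined on $\mathfrak{W}_{d,k}$ by

$$G(\theta_1, \ldots, \theta_k) = \mathbb{E}_X f(X_\theta) = \mathbb{E}\big[f(\langle\theta_1, X\rangle, \ldots, \langle\theta_k, X\rangle)\,\big|\,\theta\big].$$

The same argument as above shows that $G$ is Lipschitz on $\mathfrak{W}_{d,k}$ with Lipschitz constant $L\sqrt{B}$. It thus follows from (4) that

$$\mathbb{P}\big[|G(\theta) - M_G| > \epsilon\big] \le \sqrt{\frac{\pi}{2}}\,e^{-\frac{d\epsilon^2}{8L^2B}},$$

and if $\epsilon > 2\pi L\sqrt{B/d}$,

$$\mathbb{P}\big[|G(\theta) - \mathbb{E}G(\theta)| > \epsilon\big] \le \sqrt{\frac{\pi}{2}}\,e^{-\frac{d\epsilon^2}{32L^2B}}. \tag{6}$$

Observe that, for $\theta$ a Haar-distributed random point of $\mathfrak{W}_{d,k}$, $\mathbb{E}G(\theta) = \mathbb{E}f(X_\theta)$, and so (6) can be restated as

$$\mathbb{P}\big[X_f > \epsilon\big] \le \sqrt{\frac{\pi}{2}}\exp\Big(-\frac{d\epsilon^2}{32L^2B}\Big).$$

Note that

$$|X_f - X_g| = \Big|\big|\mathbb{E}_X f(X_\theta) - \mathbb{E}f(X_\theta)\big| - \big|\mathbb{E}_X g(X_\theta) - \mathbb{E}g(X_\theta)\big|\Big| \le \big|\mathbb{E}_X(f - g)(X_\theta) - \mathbb{E}(f - g)(X_\theta)\big| = X_{f-g};$$

thus for $\epsilon > 4\pi L(f - g)\sqrt{B/d}$, for $L(f - g)$ the Lipschitz constant of $f - g$,

$$\mathbb{P}\big[|X_f - X_g| > \epsilon\big] \le \mathbb{P}\big[X_{f-g} > \epsilon\big] \le \sqrt{\frac{\pi}{2}}\exp\Big(-\frac{d\epsilon^2}{2[4L(f - g)]^2B}\Big) \le \sqrt{\frac{\pi}{2}}\exp\Big(-\frac{d\epsilon^2}{2[8\|f - g\|_1]^2B}\Big).$$

The condition on $\epsilon$ may be removed by replacing the factor of $\sqrt{\pi/2}$ in the bound above by, e.g., $3\sqrt{\pi/2}$. The process $\{X_f\}$ therefore satisfies the sub-Gaussian increment condition for the distance

$$d^*(f, g) := 8\sqrt{\frac{B}{d}}\,\|f - g\|_1.$$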
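Consider the class $C^m_1(B_R)$ of functions $f$ which are supported on $B_R := \{x \in \mathbb{R}^k : |x| \le R\}$ such that $\|f\|_m := \sup_{0 \le j \le m}\sup_{x \in B_R}\|D^j f(x)\|_{op} \le 1$. It is proved in the appendix that for $\epsilon < 2$ and $m \ge 2$, the $\epsilon$-covering number for this set with respect to the norm $\|\cdot\|_1$ is bounded by

$$\exp\Big[\Big(3\log(5) - \frac{m}{m-1}\log(\epsilon) + \frac{c_1}{\epsilon^{k/(m-1)}}\Big)e^{k+m-2}\Big],$$

with

$$c_1 = \frac{2\pi^{k/2}(R + 1)^k}{k\,\Gamma\big(\frac{k}{2}\big)}\big((m + 4)\log(2)\big)(5)^{\frac{k}{m-1}}.$$

It follows that the $\epsilon$-covering number with respect to the distance $d^*$ is bounded by

$$\exp\Bigg[\Bigg(3\log(5) - \frac{m}{m-1}\log\Big(\frac{\epsilon\sqrt{d}}{8\sqrt{B}}\Big) + \frac{2\log(2)(m + 4)\big[\sqrt{\pi}(R + 1)\big]^k(5)^{\frac{k}{m-1}}}{k\,\Gamma\big(\frac{k}{2}\big)}\Big(\frac{8\sqrt{B}}{\epsilon\sqrt{d}}\Big)^{\frac{k}{m-1}}\Bigg)e^{k+m-2}\Bigg].$$

Since functions $f \in C^m_1(B_R)$ have in particular $\|f\|_1 \le 1$, this class also satisfies the sub-Gaussian increment condition with respect to the metric $d^*$. Note that the diameter of $C^m_1(B_R)$ with respect to $d^*$ is bounded above by $16\sqrt{B/d}$. It follows from Dudley's entropy bound that there is a constant $C$ such that $\mathbb{E}\sup_{f \in C^m_1(B_R)}X_f$ is bounded above by

$$Ce^{\frac{k+m-2}{2}}\int_0^{16\sqrt{B/d}}\sqrt{3\log(5) - \frac{m}{m-1}\log\Big(\frac{\epsilon\sqrt{d}}{8\sqrt{B}}\Big) + \frac{2\log(2)(m + 4)\big[\sqrt{\pi}(R + 1)\big]^k(5)^{\frac{k}{m-1}}}{k\,\Gamma\big(\frac{k}{2}\big)}\Big(\frac{8\sqrt{B}}{\epsilon\sqrt{d}}\Big)^{\frac{k}{m-1}}}\;d\epsilon.$$

Making the substitution $s = \frac{\epsilon\sqrt{d}}{8\sqrt{B}}$ then gives an upper bound of

$$Ce^{\frac{k+m-2}{2}}\sqrt{\frac{B}{d}}\int_0^2\sqrt{3\log(5) - \frac{m}{m-1}\log(s) + \frac{2\log(2)(m + 4)\big[\sqrt{\pi}(R + 1)\big]^k(5)^{\frac{k}{m-1}}}{k\,\Gamma\big(\frac{k}{2}\big)\,s^{\frac{k}{m-1}}}}\;ds$$

for another constant $C$. Looking at the first two summands and the third separately, as long as $m > \frac{k}{2} + 1$ (so that $s^{-k/(2(m-1))}$ is integrable at $0$), this implies that there is an absolute constant $C$ such that $\mathbb{E}\sup_{f \in C^m_1(B_R)}X_f$ is bounded by

$$Ce^{\frac{k+m}{2}}\sqrt{\frac{B}{d}}\Bigg[1 + \frac{(m - 1)\sqrt{(m + 4)\big[\sqrt{\pi}(R + 1)\big]^k(5)^{\frac{k}{m-1}}}}{(2m - k - 2)\sqrt{k\,\Gamma\big(\frac{k}{2}\big)}}\Bigg],$$

or, as will be needed in what follows,

$$\mathbb{E}\sup_{f \in C^m_1(B_R)}X_f \le \frac{C^{k+m}m^{3/2}(R + 1)^{k/2}}{(2m - k - 2)\sqrt{k\,\Gamma\big(\frac{k}{2}\big)}}\sqrt{\frac{B}{d}}. \tag{7}$$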
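From this bound, one can obtain a bound on $\mathbb{E}d_{BL}(X_\theta, \sigma Z)$ as follows. Let

$$\varphi_R(x) = \begin{cases}1 & |x| \le R, \\ R + 1 - |x| & R < |x| < R + 1, \\ 0 & R + 1 \le |x|;\end{cases}$$

that is, $\varphi_R$ is a radially symmetric cut-off function with $\|\varphi_R\|_1 \le 1$, supported on $B_{R+1}$ and with $\varphi_R = 1$ on $B_R$. For $f \in C^1_1(\mathbb{R}^k)$, let $f_R := f \cdot \varphi_R$. Then

$$\|f_R\|_1 = \max\Big\{\sup_x|f(x)\varphi_R(x)|,\ \sup_x\big|f(x)\nabla\varphi_R(x) + \varphi_R(x)\nabla f(x)\big|\Big\} \le 2.$$

Since $|f(x) - f_R(x)| = 0$ if $x \in B_R$ and $|f(x) - f_R(x)| \le 1$ for all $x \in \mathbb{R}^k$,

$$\big|\mathbb{E}_X f(X_\theta) - \mathbb{E}_X f_R(X_\theta)\big| \le \mathbb{P}\big[|X_\theta| > R\,\big|\,\theta\big] \le \frac{1}{R^2}\sum_{i=1}^k\mathbb{E}\big[\langle X, \theta_i\rangle^2\,\big|\,\theta\big] \le \frac{Bk}{R^2},$$

and the same holds if $\mathbb{E}_X$ is replaced by $\mathbb{E}$. It follows that

$$|X_f - X_{f_R}| \le \frac{2Bk}{R^2}. \tag{8}$$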
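The cut-off step can be checked numerically. The sketch below (hypothetical helper names) implements $\varphi_R(x) = \min\{1, \max\{0, R + 1 - |x|\}\}$, which is exactly the piecewise definition above, and estimates $\|f_R\|_1$ on a grid for the one-dimensional example $f = \sin$, for which $\|f\|_1 = 1$. A grid estimate can only underestimate the true Lipschitz constant, so the output is consistent with, but does not prove, the bound $\|f_R\|_1 \le 2$.

```python
import math

def phi_R(x, R):
    """Radial cut-off: 1 on B_R, R + 1 - |x| on R < |x| < R + 1, 0 outside B_{R+1}."""
    r = math.sqrt(sum(xi * xi for xi in x))
    return min(1.0, max(0.0, R + 1.0 - r))

def bl_norm_estimate_1d(func, a, b, n=20_000):
    """Grid estimate of ||func||_1 = max(sup |func|, Lipschitz constant) on [a, b]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    vals = [func(x) for x in xs]
    sup = max(abs(v) for v in vals)
    lip = max(abs(vals[i + 1] - vals[i]) / (xs[i + 1] - xs[i]) for i in range(n))
    return max(sup, lip)

R = 2.0

def f_R(x):
    """f_R = f * phi_R with f = sin, viewed as a function on R^1."""
    return math.sin(x) * phi_R([x], R)

print(phi_R([0.0, 0.0], R), phi_R([2.5, 0.0], R), phi_R([4.0, 0.0], R))  # 1.0 0.5 0.0
print(bl_norm_estimate_1d(f_R, -5.0, 5.0))  # at most 2
```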

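Next, let $\psi : \mathbb{R} \to \mathbb{R}$ be a $C^\infty$ bump function, such that $0 \le \psi(y) \le 1$ for all $y$, $\psi(y) = 1$ for $-1 \le y \le 1$, $\psi(y) = 0$ for $|y| \ge 2$, and such that

$$\Big|\frac{d^j\psi}{dy^j}(y)\Big| \le C^j j^{2j} \tag{9}$$

for all $j \in \mathbb{N}$ (the existence of such a function is guaranteed by Theorem 1.4.2 of [12]). For $x \in \mathbb{R}^k$, define

$$\psi_t(x) := \frac{C(k)}{t^k}\,\psi\Big(\frac{|x|}{t}\Big),$$

where $C(k)$ is a constant depending only on $k$, such that $\int_{\mathbb{R}^k}\psi_t = 1$. Observe that it follows from the bounds (9) that

$$\|D^j\psi_t(x)\|_{op} \le \frac{C(k)\,C^j j^{2j}}{t^{k+j}}\,\mathbb{1}(t \le |x| \le 2t). \tag{10}$$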
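One concrete function with the support properties required of $\psi$ comes from the classical smooth step built from $e^{-1/t}$. The sketch below is illustrative only: it checks $0 \le \psi \le 1$, $\psi = 1$ on $[-1, 1]$ and $\psi = 0$ for $|y| \ge 2$, and computes the normalizing constant $C(1)$ of $\psi_t$ in dimension $k = 1$ by quadrature. It does not verify the derivative bounds (9), for which the text relies on [12]; the helper names are hypothetical.

```python
import math

def _h(t):
    """C-infinity function: exp(-1/t) for t > 0 and 0 for t <= 0."""
    return math.exp(-1.0 / t) if t > 0 else 0.0

def smooth_step(t):
    """Smooth; equals 0 for t <= 0, 1 for t >= 1, strictly between otherwise."""
    return _h(t) / (_h(t) + _h(1.0 - t))

def psi(y):
    """Bump function: 1 on [-1, 1], 0 for |y| >= 2, smooth in between."""
    return smooth_step(2.0 - abs(y))

def normalizing_constant_1d(n=100_000):
    """C(1) such that C(1) times the integral of psi over R equals 1 (midpoint rule)."""
    h = 4.0 / n
    integral = sum(psi(-2.0 + (i + 0.5) * h) for i in range(n)) * h
    return 1.0 / integral

C_ONE = normalizing_constant_1d()

def psi_t(x, t):
    """Mollifier psi_t(x) = C(1) t^{-1} psi(|x| / t) in dimension k = 1."""
    return C_ONE / t * psi(abs(x) / t)

print(psi(0.0), psi(1.0), psi(1.5), psi(2.0))  # 1.0 1.0 0.5 0.0
print(1.0 / C_ONE)  # between 2 (length of [-1, 1]) and 4 (length of [-2, 2])
```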



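For $g \in C^1_2(\mathbb{R}^k)$, let $g_t(x) := g * \psi_t(x)$. Let $Y_t$ be a random vector in $\mathbb{R}^k$ with density $\psi_t$, independent of $X$ and $\theta$. Then one can write

$$\mathbb{E}_X g_t(X_\theta) = \mathbb{E}_X g(X_\theta + Y_t),$$

and the same with $\mathbb{E}$ in place of $\mathbb{E}_X$. Since $g \in C^1_2(\mathbb{R}^k)$ and $|Y_t| \le 2t$, it follows that

$$\max\Big\{\big|\mathbb{E}_X g(X_\theta) - \mathbb{E}_X g_t(X_\theta)\big|,\ \big|\mathbb{E}g(X_\theta) - \mathbb{E}g_t(X_\theta)\big|\Big\} \le 2\mathbb{E}|Y_t| \le 4t,$$

from which it follows that

$$|X_g - X_{g_t}| \le 8t. \tag{11}$$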
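Furthermore, by Young's inequality, for $j \le m$,

$$\|D^j g_t(x)\|_{op} \le \|g\|_\infty\int_{\mathbb{R}^k}\|D^j\psi_t(y)\|_{op}\,dy \le \frac{2C(k)\,C^j j^{2j}}{t^{k+j}}\operatorname{vol}(B_{2t}) = \frac{2^{k+2}\pi^{k/2}C(k)\,C^j j^{2j}}{k\,\Gamma\big(\frac{k}{2}\big)\,t^j}.$$

Now, integrating in polar coordinates,

$$\frac{1}{C(k)} = \int_{\mathbb{R}^k}\psi(|x|)\,dx = \frac{2\pi^{k/2}}{\Gamma\big(\frac{k}{2}\big)}\int_0^\infty r^{k-1}\psi(r)\,dr \ge \frac{2\pi^{k/2}}{\Gamma\big(\frac{k}{2}\big)}\int_0^1 r^{k-1}\,dr = \frac{2\pi^{k/2}}{k\,\Gamma\big(\frac{k}{2}\big)}.$$

It follows that

$$\|D^j g_t(x)\|_{op} \le \frac{2^{k+1}C^j j^{2j}}{t^j}$$

for all $x \in \mathbb{R}^k$, and so $\|g_t\|_m \le \frac{2^{k+1}C^m m^{2m}}{t^m}$. Finally, if $g$ is supported on $B_{R+1}$, then it is easy to see that $g_t$ is supported on $B_{R+1+2t}$.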
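It now follows from (7), (8) and (11) that

$$\begin{aligned}\mathbb{E}\sup_{f \in C^1_1(\mathbb{R}^k)}X_f &\le \mathbb{E}\sup_{f \in C^1_1(\mathbb{R}^k)}\Big[|X_f - X_{f_R}| + |X_{f_R} - X_{(f_R)_t}| + X_{(f_R)_t}\Big] \\ &\le \frac{2Bk}{R^2} + 8t + \sqrt{\frac{B}{d}}\,\frac{2^{k+1}m^{2m}C^{k+2m}(R + 1 + 2t)^{k/2}m^{3/2}}{(2m - k - 2)\,t^m\sqrt{k\,\Gamma\big(\frac{k}{2}\big)}}.\end{aligned} \tag{12}$$

Choosing $t = \frac{k}{R^2}$ (with $R^2 \ge k$, so that $t \le 1$) yields, for a new constant $C$,

$$\mathbb{E}\sup_{f \in C^1_1(\mathbb{R}^k)}X_f \le \frac{(2B + 8)k}{R^2} + \sqrt{\frac{B}{d}}\,\frac{C^{k+m}m^{2m}R^{2m+k/2}m^{3/2}}{(2m - k - 2)\,k^m\sqrt{k\,\Gamma\big(\frac{k}{2}\big)}}. \tag{13}$$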
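Now choosing $m = 2k$ (so that $m \ge 2$ and $m > \frac{k}{2} + 1$) and applying Stirling's formula to $\Gamma\big(\frac{k}{2}\big)$ yields

$$\mathbb{E}\sup_{f \in C^1_1(\mathbb{R}^k)}X_f \le \frac{(2B + 8)k}{R^2} + \kappa_k\sqrt{\frac{B}{d}}\,R^{9k/2}, \tag{14}$$

where $\kappa_k$ depends only on $k$. Setting $R$ so that the two terms of (14) are of the same order ($R^2$ of order $d^{2/(9k+4)}$ for fixed $k$) yields

$$\mathbb{E}\sup_{f \in C^1_1(\mathbb{R}^k)}X_f \le \frac{C^k(B + 1)}{d^{\frac{2}{9k+4}}}. \tag{15}$$

Finally, by Theorem 4 and (15),

$$\mathbb{E}d_{BL}(X_\theta, \sigma Z) \le \mathbb{E}\sup_{\|f\|_1 \le 1}\big|\mathbb{E}_X f(X_\theta) - \mathbb{E}f(X_\theta)\big| + \sup_{\|f\|_1 \le 1}\big|\mathbb{E}f(X_\theta) - \mathbb{E}f(\sigma Z)\big| \le \frac{C^k(B + 1)}{d^{\frac{2}{9k+4}}} + \frac{\sigma\sqrt{k}(A + 1) + \sigma k}{d - 1}.$$

□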
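3. Application: Projection Pursuit

In this section, the theorems of the previous section are applied to prove a quantitative, higher-dimensional version of a result of Diaconis and Freedman [7]. Let $x_1, \ldots, x_n$ be deterministic vectors in $\mathbb{R}^d$; write $x_i = (x_{i,1}, \ldots, x_{i,d})$. Define $\sigma > 0$ by the condition $\frac{1}{n}\sum_{i=1}^n|x_i|^2 = \sigma^2 d$, and define $A$ and $B$ by

$$A := \frac{1}{n}\sum_{i=1}^n\Big|\frac{|x_i|^2}{\sigma^2} - d\Big| \qquad\text{and}\qquad B := \sup_{\xi \in \mathbb{S}^{d-1}}\frac{1}{n}\sum_{i=1}^n\langle\xi, x_i\rangle^2.$$

Observe that $\frac{\sigma^2 d}{n} \le B \le \sigma^2 d$, since $B$ is the largest eigenvalue of the positive semi-definite matrix $\frac{1}{n}\sum_{i=1}^n x_i x_i^T$, whose trace is $\sigma^2 d$ and whose rank is at most $n$. Also, if $X$ is distributed uniformly over the points $\{x_i\}$, then these definitions of $\sigma$, $A$, and $B$ correspond to those in the previous section.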
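The quantities $\sigma$, $A$ and $B$ are straightforward to compute for a given data set. A minimal sketch, assuming the points are given as lists or tuples of floats; $B$ is estimated by power iteration on $\frac{1}{n}\sum_i x_i x_i^T$, which converges to the largest eigenvalue when that eigenvalue is separated from the rest. The function name and the example data are hypothetical.

```python
import math
import random

def pursuit_parameters(points, iters=500, seed=0):
    """Return (sigma, A, B) for deterministic points x_1, ..., x_n in R^d.

    sigma^2 d = (1/n) sum |x_i|^2;  A = (1/n) sum | |x_i|^2 / sigma^2 - d |;
    B = sup over unit xi of (1/n) sum <xi, x_i>^2, i.e. the largest eigenvalue
    of M = (1/n) sum x_i x_i^T, estimated here by power iteration.
    """
    n, d = len(points), len(points[0])
    sq = [sum(c * c for c in x) for x in points]
    sigma2 = sum(sq) / (n * d)
    if sigma2 == 0.0:
        raise ValueError("all points are zero")
    A = sum(abs(s / sigma2 - d) for s in sq) / n

    def apply_M(v):
        w = [0.0] * d
        for x in points:
            c = sum(a * b for a, b in zip(x, v)) / n
            for j in range(d):
                w[j] += c * x[j]
        return w

    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = apply_M(v)
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            raise RuntimeError("start vector orthogonal to the data; change seed")
        v = [a / norm for a in w]
    B = sum(a * b for a, b in zip(v, apply_M(v)))  # Rayleigh quotient, |v| = 1
    return math.sqrt(sigma2), A, B

pts = [(3.0, 0.0), (-3.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
sigma, A, B = pursuit_parameters(pts)
print(sigma ** 2, A, B)  # 2.5, 1.6, 4.5 up to rounding
```

For this example $\sigma^2 d/n = 1.25 \le B = 4.5 \le \sigma^2 d = 5$, in line with the observation above.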

Let $\theta = (\theta_1, \ldots, \theta_k)$ be a random point in $\mathfrak{W}_{d,k}$, distributed according to the rotation-invariant probability measure described in the introduction, and consider the family of random measures $\mu^\theta_{n,d,k}$ defined in terms of $\theta$ by

$$\mu^\theta_{n,d,k} := \frac{1}{n}\sum_{i=1}^n\delta_{(\langle\theta_1, x_i\rangle, \ldots, \langle\theta_k, x_i\rangle)}.$$

That is, $\mu^\theta_{n,d,k}$ puts equal mass at the projections of each of the $x_i$ onto the span of $\theta_1, \ldots, \theta_k$.
In Diaconis and Freedman [7] it was shown that, in the case $k = 1$, the measures $\mu^\theta_{n,d,1}$ converge weakly in probability to Gaussian as $n$ and $d$ tend to infinity, under the conditions that, for some $\sigma^2 > 0$ and for all $\epsilon > 0$,

$$\frac{1}{n}\Big|\big\{j \le n : \big||x_j|^2 - \sigma^2 d\big| > \epsilon d\big\}\Big| \to 0 \tag{16}$$

and

$$\frac{1}{n^2}\Big|\big\{j, k \le n : |\langle x_j, x_k\rangle| > \epsilon d\big\}\Big| \to 0. \tag{17}$$

Here, $n$, $d$, and the $x_i$ depend on a hidden index $\nu$ such that as $\nu$ tends to infinity, so do $n$ and $d$. A reasonable quantitative analog would be to require $A$ and $B$ above to be bounded, independent of $n$ and $d$. One could also allow them to grow slowly, as is clear from the statements of the theorems below. Recall that $B \ge \frac{\sigma^2 d}{n}$, so if $B$ is to remain bounded as $d$ tends to infinity, $n$ must tend to infinity at least as fast as $d$.

In recent work of the author [16], a quantitative version of the Diaconis-Freedman result was proved, giving an explicit bound on $\mathbb{P}\big[d_{BL}(\mu^\theta_{n,d,1}, \gamma_{\sigma^2}) > \epsilon\big]$, where $\gamma_{\sigma^2}$ is the Gaussian distribution on $\mathbb{R}$ with mean zero and variance $\sigma^2$. The results of Section 2 apply immediately to the random vector $X$ uniformly distributed on the $n$ points $\{x_i\}_{i=1}^n$ to give the following $k$-dimensional extensions.

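Theorem 7. If $\theta$ is a random point of $\mathfrak{W}_{d,k}$ and $X_\theta$ is distributed according to $\mu^\theta_{n,d,k}$, then

$$d_W(X_\theta, \sigma Z) \le \frac{\sigma\sqrt{k}(A + 1) + \sigma k}{d - 1}.$$

Theorem 8. For $\theta \in \mathfrak{W}_{d,k}$, let

$$d_{BL}(X_\theta, \sigma Z) = \sup_{\|f\|_1 \le 1}\Big|\frac{1}{n}\sum_{i=1}^n f\big(\langle x_i, \theta_1\rangle, \ldots, \langle x_i, \theta_k\rangle\big) - \mathbb{E}f(\sigma Z_1, \ldots, \sigma Z_k)\Big|;$$

that is, $d_{BL}(X_\theta, \sigma Z)$ is the conditional bounded-Lipschitz distance from $X_\theta$ to $\sigma Z$, conditioned on $\theta$. Then for $\epsilon > 2\pi\sqrt{B/d}$ and $\theta$ a random point of $\mathfrak{W}_{d,k}$,

$$\mathbb{P}\big[|d_{BL}(X_\theta, \sigma Z) - \mathbb{E}d_{BL}(X_\theta, \sigma Z)| > \epsilon\big] \le \sqrt{\frac{\pi}{2}}\,e^{-\frac{d\epsilon^2}{32B}}.$$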



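Theorem 9. There is a constant $C > 1$ such that

$$\mathbb{E}d_{BL}(X_\theta, \sigma Z) \le \frac{C^k(B + 1)}{d^{\frac{2}{9k+4}}} + \frac{\sigma\sqrt{k}(A + 1) + \sigma k}{d - 1}.$$

Observe that together, Theorems 8 and 9 show that for $\epsilon > 2\Big[\frac{C^k(B + 1)}{d^{2/(9k+4)}} + \frac{\sigma\sqrt{k}(A + 1) + \sigma k}{d - 1}\Big]$,

$$\mathbb{P}\big[d_{BL}(X_\theta, \sigma Z) > \epsilon\big] \le \sqrt{\frac{\pi}{2}}\,e^{-\frac{d\epsilon^2}{128B}}.$$

Note that the bound on the right tends to zero as $d \to \infty$ for any $\epsilon$ in this range. In particular, if $A$ and $B$ are bounded, $\epsilon > 0$ is fixed, and $k = c\sqrt{\log(d)}$, where $c$ is a sufficiently small constant (depending on $\epsilon$), then $\mathbb{P}\big[d_{BL}(X_\theta, \sigma Z) > \epsilon\big]$ decays exponentially as $d$ tends to infinity.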

4. Appendix: The covering number of the class $C^m_1(\mathcal{X})$

Consider the class $C^m(\mathcal{X})$ of $m$-times continuously differentiable functions on $\mathcal{X} \subseteq \mathbb{R}^d$ with norm defined by

$$\|f\|_m := \sup_{0 \le k \le m}\ \sup_{x \in \mathcal{X}}\|D^k f(x)\|_{op}.$$

Here, $D^k f(x)$ denotes the symmetric $k$-linear form given in components by

$$D^k f(x)(y_1, \ldots, y_k) = \sum_{i_1, \ldots, i_k = 1}^d\frac{\partial^k f}{\partial x_{i_1}\cdots\partial x_{i_k}}(x)\,y_{1 i_1}\cdots y_{k i_k},$$

where $y_j = (y_{j1}, \ldots, y_{jd})$. For an intrinsic definition of $D^k f(x)$, see Federer [9].

Let $C^m_M(\mathcal{X})$ be the ball of radius $M$ of $C^m(\mathcal{X})$ with respect to $\|\cdot\|_m$; in this section, the $\epsilon$-covering number of $C^m_1(\mathcal{X})$ with respect to the norms $\|\cdot\|_\infty$ (defined the usual way; in our notation, this is $\|\cdot\|_0$) and $\|\cdot\|_1$ is calculated for $m \ge 2$. The proof closely follows the approach in van der Vaart and Wellner [22] but uses the definition of $D^k f$ as a $k$-linear form instead of working in coordinates with the partial derivatives of $f$.

First, choose a $\delta$-net $\{y_i\}_{i=1}^n$ of $\mathcal{X}$, with $\delta = \delta(\epsilon)$ to be determined. One can choose such a net so that $n \le \frac{\operatorname{vol}(\mathcal{X}_1)}{\delta^d}$, where $\mathcal{X}_1 := \{x \in \mathbb{R}^d : \inf_{y \in \mathcal{X}}|x - y| < 1\}$. Now, associate to each $f \in C^m_1(\mathcal{X})$ an $m \times n$ array of operators in the following way. In the space of symmetric $k$-linear forms on $\mathbb{R}^d$, choose a $\frac{\delta^{m-k}}{2}$-net $\{T_i\}$ with respect to the operator norm. The $(k, j)$-th entry of the array $A_f$ associated to $f$ (for $0 \le k \le m - 1$ and $1 \le j \le n$) is chosen to be the closest point in the appropriate net to the $k$-linear form $D^k f(y_j)$. One can choose $\delta_0$ and $\delta_1$ such that if $f, g \in C^m_1(\mathcal{X})$ have $A_f = A_g$ (with respect to either the $\delta_0$ or the $\delta_1$ nets), then $\|f - g\|_\infty < \epsilon$ for $\delta_0$ and $\|f - g\|_1 < \epsilon$ for $\delta_1$, as follows. For $x \in \mathcal{X}$ given, choose $y_i$ with $|x - y_i| < \delta$. By Taylor's theorem applied to $f - g$,

$$(f - g)(x) = \sum_{k=0}^{m-1}\frac{1}{k!}\big(D^k(f - g)(y_i), (x - y_i, \ldots, x - y_i)\big) + R,$$

with $|R| \le \frac{2\delta^m}{m!} \le \delta^m$. Since $A_f = A_g$, it follows that $\|D^k(f - g)(y_i)\|_{op} \le \delta^{m-k}$ for $0 \le k \le m - 1$, thus by the expansion above,

$$|(f - g)(x)| \le \sum_{k=0}^{m-1}\frac{\delta^k}{k!}\|D^k(f - g)(y_i)\|_{op} + \delta^m \le \sum_{k=0}^{m-1}\frac{\delta^m}{k!} + \delta^m < 5\delta^m,$$

since $m > 1$. It follows that choosing $\delta_0 = \big(\frac{\epsilon}{5}\big)^{1/m}$ means that if $A_f = A_g$ then $\|f - g\|_\infty < \epsilon$. To choose $\delta_1$, apply Taylor's theorem to $D(f - g)$: if $|v| = 1$,

$$\big(D(f - g)(x), v\big) = \sum_{k=1}^{m-1}\frac{1}{(k-1)!}\big(D^k(f - g)(y_i), (x - y_i, \ldots, x - y_i, v)\big) + R$$

(with $x - y_i$ occurring $k - 1$ times), and $|R| \le \frac{2\delta^{m-1}}{(m-1)!} \le 2\delta^{m-1}$. As above, this implies

$$|D(f - g)(x)| \le \sum_{k=1}^{m-1}\frac{\delta^{k-1}}{(k-1)!}\|D^k(f - g)(y_i)\|_{op} + 2\delta^{m-1} < 5\delta^{m-1},$$

and thus $\|f - g\|_1 < \epsilon$ if $\delta_1 = \big(\frac{\epsilon}{5}\big)^{1/(m-1)}$.

To bound the size of an $\epsilon$-net for $C^m_1(\mathcal{X})$, it now only remains to count the number of possible arrays $A_f$ for $f \in C^m_1(\mathcal{X})$. Begin by counting the number of possibilities for the first column. Since $D^k f(y_1)$ is approximated in the $k$-th entry of $A_f$ by a point from a $\frac{\delta^{m-k}}{2}$-net, the size of such a net is needed. The space of symmetric $k$-linear forms is a finite-dimensional normed space, and the size of a net for the unit ball of such a space is given in Milman and Schechtman [17], in terms of the dimension of the space. To define an element $T$ of this space, it suffices to define $T(e_1, \ldots, e_1, \ldots, e_d, \ldots, e_d)$, where $e_j$ appears $k_j$ times with $k_j \ge 0$ for each $j$ and $\sum_{j=1}^d k_j = k$. The number of such vectors $(k_j)$ is well-known (see, e.g., [20]) to be $\binom{k+d-1}{k}$. It follows from the bound in [17] that there is a $\frac{\delta^{m-k}}{2}$-net of the unit ball of the space of symmetric $k$-linear forms of size not greater than

$$\Big(1 + \frac{4}{\delta^{m-k}}\Big)^{\binom{k+d-1}{k}} \le \Big(\frac{5}{\delta^{m-k}}\Big)^{\binom{k+d-1}{k}},$$

assuming that $\delta < 1$. Since the only interesting case is $\epsilon < 2$ (since $\|f - g\|_i \le 2$ for $i = 0, 1$ automatically), this is no restriction. The number of possibilities for the first column of $A_f$ is therefore bounded by

$$\prod_{k=0}^{m-1}\Big(\frac{5}{\delta^{m-k}}\Big)^{\binom{k+d-1}{k}} \le \exp\Big[\big(\log(5) - m\log(\delta)\big)\sum_{k=0}^{m-1}\frac{(m+d-2)^k}{k!}\Big] \le \exp\Big[\big(\log(5) - m\log(\delta)\big)e^{m+d-2}\Big],$$

since $\delta < 1$ and $m + d - 2 \ge 0$. To bound the number of possibilities in the remaining columns, assume that the $y_i$ have been ordered such that for all $j > 1$, there is an $i < j$ with $|y_i - y_j| < 2\delta$. Now, for unit vectors $v_1, \ldots, v_k \in \mathbb{R}^d$, define the function $F(x) := \big(D^k f(x), (v_1, \ldots, v_k)\big)$, where the dependence of $F$ on the $v_i$ has been suppressed. By Taylor's theorem,

$$F(y_j) = \sum_{l=0}^{m-1-k}\frac{1}{l!}\big(D^l F(y_i), (y_j - y_i, \ldots, y_j - y_i)\big) + R,$$

with $|R| \le \frac{(2\delta)^{m-k}}{(m-k)!}$. Let $A_f(l, j)$ denote the $(l, j)$-th entry of the array $A_f$. Then

$$\begin{aligned}\Big|F(y_j) - \sum_{l=0}^{m-1-k}\frac{1}{l!}\big(A_f(k + l, i), (y_j - y_i, \ldots, y_j - y_i, v_1, \ldots, v_k)\big)\Big| &\le \sum_{l=0}^{m-1-k}\frac{1}{l!}\Big|\big(D^{k+l}f(y_i) - A_f(k + l, i), (y_j - y_i, \ldots, y_j - y_i, v_1, \ldots, v_k)\big)\Big| + \frac{(2\delta)^{m-k}}{(m-k)!} \\ &\le \sum_{l=0}^{m-1-k}\frac{(2\delta)^l\,\delta^{m-k-l}}{2\,l!} + \frac{(2\delta)^{m-k}}{(m-k)!} \le (2\delta)^{m-k}\Big(1 + \frac{e}{2}\Big).\end{aligned}$$

That is, given the information in the previous columns, the symmetric $k$-linear form $T(v_1, \ldots, v_k) := \big(D^k f(y_j), (v_1, \ldots, v_k)\big)$ is within a ball of radius $(2\delta)^{m-k}\big(1 + \frac{e}{2}\big)$ with respect to the operator norm. By the same argument that bounds the size of the original $\frac{\delta^{m-k}}{2}$-net in the space, the number of points of the net within this ball of radius $(2\delta)^{m-k}\big(1 + \frac{e}{2}\big)$ is bounded by

$$\Big(1 + 2^{m-k+2}\Big(1 + \frac{e}{2}\Big)\Big)^{\binom{k+d-1}{k}} \le 2^{(m-k+4)\binom{k+d-1}{k}}.$$

It follows that the number of possibilities for the column entries of $A_f$ after the first column is specified is bounded by

$$\prod_{k=0}^{m-1}2^{(m-k+4)\,n\binom{k+d-1}{k}} \le \exp\Big[\frac{\operatorname{vol}(\mathcal{X}_1)\big((m + 4)\log(2)\big)}{\delta^d}\sum_{k=0}^{m-1}\frac{(m+d-2)^k}{k!}\Big] \le \exp\Big[\frac{\operatorname{vol}(\mathcal{X}_1)\big((m + 4)\log(2)\big)}{\delta^d}\,e^{d+m-2}\Big].$$

It now follows that the total number of possible arrays $A_f$ is bounded by

$$\exp\Big[\Big(\log(5) - m\log(\delta) + \frac{\operatorname{vol}(\mathcal{X}_1)\big((m + 4)\log(2)\big)}{\delta^d}\Big)e^{d+m-2}\Big].$$

Recall that $\delta_0$ and $\delta_1$ were chosen such that $\delta_0 = \big(\frac{\epsilon}{5}\big)^{1/m}$ and $\delta_1 = \big(\frac{\epsilon}{5}\big)^{1/(m-1)}$. The $\epsilon$-covering number (for $\epsilon < 2$) of $C^m_1(\mathcal{X})$ with respect to $\|\cdot\|_\infty$ is thus bounded by

$$\exp\Big[\Big(2\log(5) - \log(\epsilon) + \frac{c_0}{\epsilon^{d/m}}\Big)e^{d+m-2}\Big],$$

with

$$c_0 = \operatorname{vol}(\mathcal{X}_1)\big((m + 4)\log(2)\big)(5)^{d/m}.$$

The $\epsilon$-covering number of $C^m_1(\mathcal{X})$ for $\epsilon < 2$ and $m \ge 2$ with respect to $\|\cdot\|_1$ is bounded by

$$\exp\Big[\Big(3\log(5) - \frac{m}{m-1}\log(\epsilon) + \frac{c_1}{\epsilon^{d/(m-1)}}\Big)e^{d+m-2}\Big],$$

with

$$c_1 = \operatorname{vol}(\mathcal{X}_1)\big((m + 4)\log(2)\big)(5)^{\frac{d}{m-1}}.$$
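The two counting facts used in this appendix can be confirmed for small parameters. The sketch below checks that the number of multi-indices $(k_1, \ldots, k_d)$ with $k_j \ge 0$ and $\sum_j k_j = k$ is $\binom{k+d-1}{k}$, and that $\sum_{k=0}^{m-1}\binom{k+d-1}{k}$ (which equals $\binom{m+d-1}{m-1}$) is at most $\sum_{k=0}^{m-1}\frac{(m+d-2)^k}{k!} \le e^{m+d-2}$, the estimate behind the factor $e^{m+d-2}$. It is a sanity check on small cases, not a proof; the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb, exp, factorial

def count_multi_indices(d, k):
    """Ways to choose k coordinates out of d with repetition, order ignored."""
    return sum(1 for _ in combinations_with_replacement(range(d), k))

def check(d_max=6, m_max=6):
    """Verify the dimension count and the e^{m+d-2} bound for small d and m."""
    for d in range(1, d_max + 1):
        for k in range(m_max):
            assert count_multi_indices(d, k) == comb(k + d - 1, k)
        for m in range(1, m_max + 1):
            terms = [comb(k + d - 1, k) for k in range(m)]
            bounds = [(m + d - 2) ** k / factorial(k) for k in range(m)]
            assert all(t <= b + 1e-9 for t, b in zip(terms, bounds))
            assert sum(terms) == comb(m + d - 1, m - 1)  # hockey-stick identity
            assert sum(terms) <= exp(m + d - 2) + 1e-9
    return True

print(check())
```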